Dataset statistics
| Number of variables | 22 |
|---|---|
| Number of observations | 73,908 |
| Missing cells | 288,306 |
| Missing cells (%) | 17.7% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 12.4 MiB |
| Average record size in memory | 176.0 B |
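The percentage and per-record figures in this table can be rechecked from the raw counts. The sketch below only uses numbers copied from the table; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 22
n_observations = 73_908
missing_cells = 288_306
total_size_mib = 12.4

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes. The reported total is rounded to one decimal,
# so the recomputed record size is only approximately the reported 176.0 B.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells:       {total_cells:,}")
print(f"missing cells (%): {missing_pct:.1f}%")
print(f"avg record size:   {avg_record_bytes:.1f} B")
```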
Variable types
| Numeric | 11 |
|---|---|
| Categorical | 7 |
| DateTime | 2 |
| Boolean | 1 |
| Unsupported | 1 |
Alerts
| Alert | Type |
|---|---|
| df_index is highly correlated with fare_amount and 2 other fields | High correlation |
| RatecodeID is highly correlated with mta_tax and 1 other field | High correlation |
| trip_distance is highly correlated with fare_amount and 2 other fields | High correlation |
| fare_amount is highly correlated with df_index and 4 other fields | High correlation |
| mta_tax is highly correlated with df_index and 4 other fields | High correlation |
| tip_amount is highly correlated with payment_type | High correlation |
| total_amount is highly correlated with df_index and 4 other fields | High correlation |
| payment_type is highly correlated with tip_amount | High correlation |
| trip_type is highly correlated with RatecodeID and 1 other field | High correlation |
| duration is highly correlated with trip_distance and 2 other fields | High correlation |
| df_index is highly correlated with mta_tax | High correlation |
| RatecodeID is highly correlated with mta_tax and 1 other field | High correlation |
| fare_amount is highly correlated with total_amount and 1 other field | High correlation |
| mta_tax is highly correlated with df_index and 2 other fields | High correlation |
| tip_amount is highly correlated with payment_type | High correlation |
| tolls_amount is highly correlated with total_amount | High correlation |
| total_amount is highly correlated with fare_amount and 2 other fields | High correlation |
| payment_type is highly correlated with tip_amount | High correlation |
| trip_type is highly correlated with RatecodeID and 1 other field | High correlation |
| duration is highly correlated with fare_amount and 1 other field | High correlation |
| df_index is highly correlated with mta_tax | High correlation |
| RatecodeID is highly correlated with mta_tax and 1 other field | High correlation |
| trip_distance is highly correlated with fare_amount and 2 other fields | High correlation |
| fare_amount is highly correlated with trip_distance and 2 other fields | High correlation |
| mta_tax is highly correlated with df_index and 2 other fields | High correlation |
| tip_amount is highly correlated with payment_type | High correlation |
| total_amount is highly correlated with trip_distance and 2 other fields | High correlation |
| payment_type is highly correlated with tip_amount | High correlation |
| trip_type is highly correlated with RatecodeID and 1 other field | High correlation |
| duration is highly correlated with trip_distance and 2 other fields | High correlation |
| RatecodeID is highly correlated with mta_tax and 1 other field | High correlation |
| mta_tax is highly correlated with RatecodeID and 2 other fields | High correlation |
| improvement_surcharge is highly correlated with mta_tax | High correlation |
| trip_type is highly correlated with RatecodeID and 1 other field | High correlation |
| df_index is highly correlated with extra and 2 other fields | High correlation |
| RatecodeID is highly correlated with mta_tax and 1 other field | High correlation |
| fare_amount is highly correlated with mta_tax and 2 other fields | High correlation |
| extra is highly correlated with df_index and 2 other fields | High correlation |
| mta_tax is highly correlated with df_index and 6 other fields | High correlation |
| tip_amount is highly correlated with df_index | High correlation |
| improvement_surcharge is highly correlated with fare_amount and 3 other fields | High correlation |
| total_amount is highly correlated with fare_amount and 2 other fields | High correlation |
| trip_type is highly correlated with RatecodeID and 1 other field | High correlation |
| store_and_fwd_flag has 35733 (48.3%) missing values | Missing |
| RatecodeID has 35733 (48.3%) missing values | Missing |
| passenger_count has 35733 (48.3%) missing values | Missing |
| ehail_fee has 73908 (100.0%) missing values | Missing |
| payment_type has 35733 (48.3%) missing values | Missing |
| trip_type has 35733 (48.3%) missing values | Missing |
| congestion_surcharge has 35733 (48.3%) missing values | Missing |
| trip_distance is highly skewed (γ1 = 72.23375746) | Skewed |
| df_index has unique values | Unique |
| ehail_fee is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| trip_distance has 1626 (2.2%) zeros | Zeros |
| extra has 45658 (61.8%) zeros | Zeros |
| tip_amount has 33957 (45.9%) zeros | Zeros |
| tolls_amount has 67701 (91.6%) zeros | Zeros |
Reproduction
| Analysis started | 2022-07-28 08:53:58.753562 |
|---|---|
| Analysis finished | 2022-07-28 08:54:15.622245 |
| Duration | 16.87 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
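The reported duration follows directly from the two timestamps in the table. A minimal check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table above.
started = datetime.fromisoformat("2022-07-28 08:53:58.753562")
finished = datetime.fromisoformat("2022-07-28 08:54:15.622245")

# Subtracting two datetimes gives a timedelta; convert it to seconds.
elapsed = (finished - started).total_seconds()
print(f"{elapsed:.2f} seconds")
```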
df_index
Real number (ℝ≥0)
HIGH CORRELATION (×4), UNIQUE
| Distinct | 73908 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38742.87852 |
| Minimum | 0 |
|---|---|
| Maximum | 76517 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3933.35 |
| Q1 | 19621.75 |
| median | 39181.5 |
| Q3 | 57849.25 |
| 95-th percentile | 72778.65 |
| Maximum | 76517 |
| Range | 76517 |
| Interquartile range (IQR) | 38227.5 |
Descriptive statistics
| Standard deviation | 22076.00452 |
|---|---|
| Coefficient of variation (CV) | 0.5698080619 |
| Kurtosis | -1.198774571 |
| Mean | 38742.87852 |
| Median Absolute Deviation (MAD) | 19109 |
| Skewness | -0.03185112273 |
| Sum | 2863408666 |
| Variance | 487349975.8 |
| Monotonicity | Strictly increasing |
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 51730 | 1 | < 0.1% |
| 51638 | 1 | < 0.1% |
| 51637 | 1 | < 0.1% |
| 51636 | 1 | < 0.1% |
| 51635 | 1 | < 0.1% |
| 51634 | 1 | < 0.1% |
| 51633 | 1 | < 0.1% |
| 51631 | 1 | < 0.1% |
| 51630 | 1 | < 0.1% |
| Other values (73898) | 73898 | > 99.9% |
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 76517 | 1 | < 0.1% |
| 76516 | 1 | < 0.1% |
| 76515 | 1 | < 0.1% |
| 76514 | 1 | < 0.1% |
| 76513 | 1 | < 0.1% |
| 76512 | 1 | < 0.1% |
| 76511 | 1 | < 0.1% |
| 76510 | 1 | < 0.1% |
| 76509 | 1 | < 0.1% |
| 76508 | 1 | < 0.1% |
VendorID
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 577.5 KiB |
| 2 | 66922 |
|---|---|
| 1 | 6986 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 73,908 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 2 |
| 3rd row | 2 |
| 4th row | 2 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 66922 | 90.5% |
| 1 | 6986 | 9.5% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 2 | 66922 | |
| 1 | 6986 | 9.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 66922 | |
| 1 | 6986 | 9.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 73908 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 66922 | |
| 1 | 6986 | 9.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 73908 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 66922 | |
| 1 | 6986 | 9.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 73908 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 66922 | |
| 1 | 6986 | 9.5% |
| Distinct | 56331 |
|---|---|
| Distinct (%) | 76.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 577.5 KiB |
| Minimum | 2009-01-01 00:03:25 |
|---|---|
| Maximum | 2021-01-31 23:46:45 |
| Distinct | 56402 |
|---|---|
| Distinct (%) | 76.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 577.5 KiB |
| Minimum | 2009-01-01 00:12:25 |
|---|---|
| Maximum | 2021-01-31 23:57:08 |
store_and_fwd_flag
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 35733 |
| Missing (%) | 48.3% |
| Memory size | 144.5 KiB |
| False | 37944 |
|---|---|
| True | 231 |
| (Missing) | 35733 |
| Value | Count | Frequency (%) |
| False | 37944 | 51.3% |
| True | 231 | 0.3% |
| (Missing) | 35733 | 48.3% |
RatecodeID
Categorical
HIGH CORRELATION (×5), MISSING
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 35733 |
| Missing (%) | 48.3% |
| Memory size | 577.5 KiB |
| 1.0 | 37358 |
|---|---|
| 5.0 | 756 |
| 2.0 | 29 |
| 4.0 | 28 |
| 3.0 | 4 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 114,525 |
|---|---|
| Distinct characters | 7 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 37358 | 50.5% |
| 5.0 | 756 | 1.0% |
| 2.0 | 29 | < 0.1% |
| 4.0 | 28 | < 0.1% |
| 3.0 | 4 | < 0.1% |
| (Missing) | 35733 | 48.3% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1.0 | 37358 | |
| 5.0 | 756 | 2.0% |
| 2.0 | 29 | 0.1% |
| 4.0 | 28 | 0.1% |
| 3.0 | 4 | < 0.1% |
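The same count appears with different percentages in the tables for this variable (for example, value 5.0 with count 756 shows 1.0%, 2.0%, and 0.7%). The numbers are consistent with each table using a different denominator: all rows, non-missing rows, or total characters. The sketch below checks that reading; it is inferred from the printed figures and is an assumption, not something the report states.

```python
# Counts copied from the RatecodeID tables in this section.
n_rows = 73_908
n_missing = 35_733
n_present = n_rows - n_missing   # rows with a RatecodeID value
chars_per_value = 3              # every value is rendered as e.g. "1.0"
total_chars = n_present * chars_per_value

count = 756                      # rows where RatecodeID is 5.0

pct_of_all_rows = 100 * count / n_rows       # "Common Values" table
pct_of_present = 100 * count / n_present     # "Category Frequency Plot" table
pct_of_chars = 100 * count / total_chars     # "Most occurring characters" table

print(f"non-missing rows:  {n_present:,}")
print(f"total characters:  {total_chars:,}")
print(f"share of all rows: {pct_of_all_rows:.1f}%")
print(f"share of present:  {pct_of_present:.1f}%")
print(f"share of chars:    {pct_of_chars:.1f}%")
```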
Most occurring characters
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 37358 | |
| 5 | 756 | 0.7% |
| 2 | 29 | < 0.1% |
| 4 | 28 | < 0.1% |
| 3 | 4 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 76350 | |
| Other Punctuation | 38175 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 38175 | |
| 1 | 37358 | |
| 5 | 756 | 1.0% |
| 2 | 29 | < 0.1% |
| 4 | 28 | < 0.1% |
| 3 | 4 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 38175 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 114525 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 37358 | |
| 5 | 756 | 0.7% |
| 2 | 29 | < 0.1% |
| 4 | 28 | < 0.1% |
| 3 | 4 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 114525 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 37358 | |
| 5 | 756 | 0.7% |
| 2 | 29 | < 0.1% |
| 4 | 28 | < 0.1% |
| 3 | 4 | < 0.1% |
PULocationID
Real number (ℝ≥0)
| Distinct | 250 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 109.0945094 |
| Minimum | 3 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 52 |
| median | 76 |
| Q3 | 166 |
| 95-th percentile | 244 |
| Maximum | 265 |
| Range | 262 |
| Interquartile range (IQR) | 114 |
Descriptive statistics
| Standard deviation | 70.85340385 |
|---|---|
| Coefficient of variation (CV) | 0.649468101 |
| Kurtosis | -0.7941199081 |
| Mean | 109.0945094 |
| Median Absolute Deviation (MAD) | 35 |
| Skewness | 0.6672850804 |
| Sum | 8062957 |
| Variance | 5020.204837 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 74 | 6555 | 8.9% |
| 75 | 6063 | 8.2% |
| 41 | 4071 | 5.5% |
| 42 | 2683 | 3.6% |
| 244 | 2554 | 3.5% |
| 95 | 2006 | 2.7% |
| 97 | 1927 | 2.6% |
| 166 | 1889 | 2.6% |
| 65 | 1325 | 1.8% |
| 43 | 1320 | 1.8% |
| Other values (240) | 43515 | 58.9% |
| Value | Count | Frequency (%) |
| 3 | 194 | 0.3% |
| 4 | 27 | < 0.1% |
| 7 | 1171 | 1.6% |
| 8 | 1 | < 0.1% |
| 9 | 82 | 0.1% |
| 10 | 273 | 0.4% |
| 11 | 96 | 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 9 | < 0.1% |
| 14 | 403 | 0.5% |
| Value | Count | Frequency (%) |
| 265 | 119 | 0.2% |
| 264 | 32 | < 0.1% |
| 263 | 127 | 0.2% |
| 262 | 25 | < 0.1% |
| 261 | 9 | < 0.1% |
| 260 | 216 | 0.3% |
| 259 | 149 | 0.2% |
| 258 | 116 | 0.2% |
| 257 | 85 | 0.1% |
| 256 | 66 | 0.1% |
DOLocationID
Real number (ℝ≥0)
| Distinct | 256 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 130.4430508 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 24 |
| Q1 | 65 |
| median | 129 |
| Q3 | 197 |
| 95-th percentile | 248 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 132 |
Descriptive statistics
| Standard deviation | 76.94977646 |
|---|---|
| Coefficient of variation (CV) | 0.589910892 |
| Kurtosis | -1.310876723 |
| Mean | 130.4430508 |
| Median Absolute Deviation (MAD) | 67 |
| Skewness | 0.1850186625 |
| Sum | 9640785 |
| Variance | 5921.268098 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 74 | 3042 | 4.1% |
| 75 | 2716 | 3.7% |
| 42 | 2598 | 3.5% |
| 41 | 2164 | 2.9% |
| 236 | 1417 | 1.9% |
| 61 | 1351 | 1.8% |
| 238 | 1329 | 1.8% |
| 166 | 1283 | 1.7% |
| 244 | 1104 | 1.5% |
| 263 | 1094 | 1.5% |
| Other values (246) | 55810 | 75.5% |
| Value | Count | Frequency (%) |
| 1 | 7 | < 0.1% |
| 2 | 7 | < 0.1% |
| 3 | 151 | 0.2% |
| 4 | 91 | 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 627 | 0.8% |
| 8 | 2 | < 0.1% |
| 9 | 87 | 0.1% |
| 10 | 353 | 0.5% |
| 11 | 87 | 0.1% |
| Value | Count | Frequency (%) |
| 265 | 259 | 0.4% |
| 264 | 65 | 0.1% |
| 263 | 1094 | 1.5% |
| 262 | 464 | 0.6% |
| 261 | 72 | 0.1% |
| 260 | 303 | 0.4% |
| 259 | 163 | 0.2% |
| 258 | 151 | 0.2% |
| 257 | 128 | 0.2% |
| 256 | 124 | 0.2% |
passenger_count
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 35733 |
| Missing (%) | 48.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.196123117 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 110 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7626378212 |
|---|---|
| Coefficient of variation (CV) | 0.6375914069 |
| Kurtosis | 22.01311368 |
| Mean | 1.196123117 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.609093394 |
| Sum | 45662 |
| Variance | 0.5816164463 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 34379 | 46.5% |
| 2 | 2227 | 3.0% |
| 5 | 683 | 0.9% |
| 3 | 314 | 0.4% |
| 6 | 312 | 0.4% |
| 4 | 150 | 0.2% |
| 0 | 110 | 0.1% |
| (Missing) | 35733 | 48.3% |
| Value | Count | Frequency (%) |
| 0 | 110 | 0.1% |
| 1 | 34379 | 46.5% |
| 2 | 2227 | 3.0% |
| 3 | 314 | 0.4% |
| 4 | 150 | 0.2% |
| 5 | 683 | 0.9% |
| 6 | 312 | 0.4% |
| Value | Count | Frequency (%) |
| 6 | 312 | 0.4% |
| 5 | 683 | 0.9% |
| 4 | 150 | 0.2% |
| 3 | 314 | 0.4% |
| 2 | 2227 | 3.0% |
| 1 | 34379 | 46.5% |
| 0 | 110 | 0.1% |
trip_distance
Real number (ℝ≥0)
| Distinct | 2794 |
|---|---|
| Distinct (%) | 3.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 42.0477196 |
| Minimum | 0 |
|---|---|
| Maximum | 244152.01 |
| Zeros | 1626 |
| Zeros (%) | 2.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 1.34 |
| median | 2.6 |
| Q3 | 5.68 |
| 95-th percentile | 15.36 |
| Maximum | 244152.01 |
| Range | 244152.01 |
| Interquartile range (IQR) | 4.34 |
Descriptive statistics
| Standard deviation | 1958.08235 |
|---|---|
| Coefficient of variation (CV) | 46.56809856 |
| Kurtosis | 6401.155255 |
| Mean | 42.0477196 |
| Median Absolute Deviation (MAD) | 1.59 |
| Skewness | 72.23375746 |
| Sum | 3107662.86 |
| Variance | 3834086.49 |
| Monotonicity | Not monotonic |
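The gap between the median (2.6) and the mean (42.05), together with a skewness of about 72, points to a few extreme values. One common way to quantify this is Tukey's 1.5 × IQR rule; the sketch below applies it to the quartiles printed above. The rule and the multiplier are a conventional choice made here for illustration, not something the report itself applies.

```python
# Quartiles, 95th percentile, and maximum copied from the tables above.
q1, q3 = 1.34, 5.68
p95 = 15.36
maximum = 244152.01

iqr = q3 - q1
k = 1.5  # conventional Tukey multiplier; an assumption, not a report setting
lower_fence = q1 - k * iqr
upper_fence = q3 + k * iqr

print(f"IQR:         {iqr:.2f}")
print(f"lower fence: {lower_fence:.2f}")
print(f"upper fence: {upper_fence:.2f}")

# The 95th percentile already exceeds the upper fence, so at least 5% of
# observations would be flagged, and the maximum lies far beyond it.
print(f"p95 above fence: {p95 > upper_fence}")
print(f"max / fence:     {maximum / upper_fence:,.0f}x")
```

Under this rule a fixed cutoff near the upper fence would discard many legitimate long trips, so a domain-based limit may be a better fit for cleaning.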
| Value | Count | Frequency (%) |
| 0 | 1626 | 2.2% |
| 1.3 | 442 | 0.6% |
| 1.4 | 430 | 0.6% |
| 1.2 | 417 | 0.6% |
| 1.1 | 406 | 0.5% |
| 1 | 395 | 0.5% |
| 0.9 | 375 | 0.5% |
| 1.5 | 365 | 0.5% |
| 1.7 | 353 | 0.5% |
| 0.8 | 337 | 0.5% |
| Other values (2784) | 68762 | 93.0% |
| Value | Count | Frequency (%) |
| 0 | 1626 | 2.2% |
| 0.01 | 27 | < 0.1% |
| 0.02 | 11 | < 0.1% |
| 0.03 | 15 | < 0.1% |
| 0.04 | 15 | < 0.1% |
| 0.05 | 13 | < 0.1% |
| 0.06 | 16 | < 0.1% |
| 0.07 | 8 | < 0.1% |
| 0.08 | 10 | < 0.1% |
| 0.09 | 8 | < 0.1% |
| Value | Count | Frequency (%) |
| 244152.01 | 1 | < 0.1% |
| 182840.32 | 1 | < 0.1% |
| 150672.01 | 1 | < 0.1% |
| 144972.47 | 1 | < 0.1% |
| 144948.19 | 1 | < 0.1% |
| 129402.5 | 1 | < 0.1% |
| 114653.63 | 1 | < 0.1% |
| 105286.83 | 1 | < 0.1% |
| 76058.91 | 1 | < 0.1% |
| 69045.73 | 1 | < 0.1% |
fare_amount
Real number (ℝ)
| Distinct | 3277 |
|---|---|
| Distinct (%) | 4.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.51612789 |
| Minimum | -280 |
|---|---|
| Maximum | 280 |
| Zeros | 76 |
| Zeros (%) | 0.1% |
| Negative | 72 |
| Negative (%) | 0.1% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | -280 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 9 |
| median | 16.83 |
| Q3 | 25.21 |
| 95-th percentile | 46.88 |
| Maximum | 280 |
| Range | 560 |
| Interquartile range (IQR) | 16.21 |
Descriptive statistics
| Standard deviation | 13.45626445 |
|---|---|
| Coefficient of variation (CV) | 0.6894945826 |
| Kurtosis | 8.635783862 |
| Mean | 19.51612789 |
| Median Absolute Deviation (MAD) | 7.83 |
| Skewness | 1.409656861 |
| Sum | 1442397.98 |
| Variance | 181.071053 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7 | 2204 | 3.0% |
| 6.5 | 1930 | 2.6% |
| 6 | 1883 | 2.5% |
| 8 | 1753 | 2.4% |
| 7.5 | 1705 | 2.3% |
| 5.5 | 1679 | 2.3% |
| 5 | 1513 | 2.0% |
| 8.5 | 1511 | 2.0% |
| 9 | 1431 | 1.9% |
| 10 | 1244 | 1.7% |
| Other values (3267) | 57055 | 77.2% |
| Value | Count | Frequency (%) |
| -280 | 1 | < 0.1% |
| -120 | 1 | < 0.1% |
| -52 | 1 | < 0.1% |
| -33.87 | 1 | < 0.1% |
| -28 | 1 | < 0.1% |
| -25 | 3 | < 0.1% |
| -21.16 | 1 | < 0.1% |
| -15.98 | 1 | < 0.1% |
| -15 | 1 | < 0.1% |
| -13.44 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 280 | 1 | < 0.1% |
| 171 | 1 | < 0.1% |
| 166 | 1 | < 0.1% |
| 150 | 1 | < 0.1% |
| 125 | 1 | < 0.1% |
| 120.5 | 1 | < 0.1% |
| 120 | 3 | < 0.1% |
| 111 | 1 | < 0.1% |
| 110 | 1 | < 0.1% |
| 106.91 | 1 | < 0.1% |
extra
Real number (ℝ)
| Distinct | 15 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.7556010175 |
| Minimum | -5.5 |
|---|---|
| Maximum | 8.25 |
| Zeros | 45658 |
| Zeros (%) | 61.8% |
| Negative | 29 |
| Negative (%) | < 0.1% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | -5.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2.75 |
| Maximum | 8.25 |
| Range | 13.75 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.19422487 |
|---|---|
| Coefficient of variation (CV) | 1.58049664 |
| Kurtosis | 1.290200132 |
| Mean | 0.7556010175 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.462910674 |
| Sum | 55844.96 |
| Variance | 1.426173039 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 45658 | 61.8% |
| 2.75 | 14409 | 19.5% |
| 1 | 7156 | 9.7% |
| 0.5 | 5027 | 6.8% |
| 5.5 | 687 | 0.9% |
| 3.75 | 393 | 0.5% |
| 3.25 | 261 | 0.4% |
| 1.35 | 258 | 0.3% |
| 4.09 | 24 | < 0.1% |
| -0.5 | 16 | < 0.1% |
| Other values (5) | 19 | < 0.1% |
| Value | Count | Frequency (%) |
| -5.5 | 1 | < 0.1% |
| -2.75 | 3 | < 0.1% |
| -1 | 9 | < 0.1% |
| -0.5 | 16 | < 0.1% |
| 0 | 45658 | 61.8% |
| 0.5 | 5027 | 6.8% |
| 1 | 7156 | 9.7% |
| 1.35 | 258 | 0.3% |
| 2.75 | 14409 | 19.5% |
| 3.25 | 261 | 0.4% |
| Value | Count | Frequency (%) |
| 8.25 | 2 | < 0.1% |
| 5.5 | 687 | 0.9% |
| 4.5 | 4 | < 0.1% |
| 4.09 | 24 | < 0.1% |
| 3.75 | 393 | 0.5% |
| 3.25 | 261 | 0.4% |
| 2.75 | 14409 | 19.5% |
| 1.35 | 258 | 0.3% |
| 1 | 7156 | 9.7% |
| 0.5 | 5027 | 6.8% |
mta_tax
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 577.5 KiB |
| 0.5 | 37463 |
|---|---|
| 0.0 | 36386 |
| -0.5 | 59 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.00079829 |
| Min length | 3 |
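The length and character statistics for this variable follow from its value counts (0.5, 0.0, and -0.5, as listed under Common Values), treating each value as a string. The sketch below recomputes the mean length, the total character count, and the per-character counts; the dictionary and variable names are illustrative.

```python
from collections import Counter

# Value counts copied from the "Common Values" table for this variable.
value_counts = {"0.5": 37_463, "0.0": 36_386, "-0.5": 59}

n_values = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_values

# Count how often each character occurs across all rendered values.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

print(f"values:           {n_values:,}")
print(f"total characters: {total_chars:,}")
print(f"mean length:      {mean_length:.9f}")
for ch, count in char_counts.most_common():
    print(f"{ch!r}: {count:,}")
```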
Characters and Unicode
| Total characters | 221,783 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 0.5 |
| 4th row | 0.5 |
| 5th row | 0.5 |
Common Values
| Value | Count | Frequency (%) |
| 0.5 | 37463 | 50.7% |
| 0.0 | 36386 | 49.2% |
| -0.5 | 59 | 0.1% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0.5 | 37522 | |
| 0.0 | 36386 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 110294 | |
| . | 73908 | |
| 5 | 37522 | 16.9% |
| - | 59 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 147816 | |
| Other Punctuation | 73908 | |
| Dash Punctuation | 59 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 110294 | |
| 5 | 37522 | 25.4% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 73908 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 59 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 221783 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 110294 | |
| . | 73908 | |
| 5 | 37522 | 16.9% |
| - | 59 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 221783 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 110294 | |
| . | 73908 | |
| 5 | 37522 | 16.9% |
| - | 59 | < 0.1% |
tip_amount
Real number (ℝ)
| Distinct | 967 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.509278292 |
| Minimum | -9.45 |
|---|---|
| Maximum | 110 |
| Zeros | 33957 |
| Zeros (%) | 45.9% |
| Negative | 4 |
| Negative (%) | < 0.1% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | -9.45 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.26 |
| Q3 | 2.75 |
| 95-th percentile | 4 |
| Maximum | 110 |
| Range | 119.45 |
| Interquartile range (IQR) | 2.75 |
Descriptive statistics
| Standard deviation | 1.774128974 |
|---|---|
| Coefficient of variation (CV) | 1.175481675 |
| Kurtosis | 203.2600878 |
| Mean | 1.509278292 |
| Median Absolute Deviation (MAD) | 1.26 |
| Skewness | 4.937682598 |
| Sum | 111547.74 |
| Variance | 3.147533618 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 33957 | 45.9% |
| 2.75 | 20547 | 27.8% |
| 1 | 1307 | 1.8% |
| 2 | 1213 | 1.6% |
| 3 | 526 | 0.7% |
| 1.66 | 322 | 0.4% |
| 1.56 | 311 | 0.4% |
| 1.46 | 296 | 0.4% |
| 1.36 | 274 | 0.4% |
| 1.96 | 253 | 0.3% |
| Other values (957) | 14902 | 20.2% |
| Value | Count | Frequency (%) |
| -9.45 | 1 | < 0.1% |
| -1.14 | 3 | < 0.1% |
| 0 | 33957 | 45.9% |
| 0.01 | 64 | 0.1% |
| 0.02 | 8 | < 0.1% |
| 0.03 | 8 | < 0.1% |
| 0.04 | 4 | < 0.1% |
| 0.05 | 7 | < 0.1% |
| 0.06 | 2 | < 0.1% |
| 0.07 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 110 | 1 | < 0.1% |
| 42 | 1 | < 0.1% |
| 38 | 1 | < 0.1% |
| 31.2 | 1 | < 0.1% |
| 30 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 24.09 | 1 | < 0.1% |
| 24.06 | 1 | < 0.1% |
| 24 | 1 | < 0.1% |
| 20.55 | 1 | < 0.1% |
tolls_amount
Real number (ℝ≥0)
| Distinct | 26 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5169471505 |
| Minimum | 0 |
|---|---|
| Maximum | 31.25 |
| Zeros | 67701 |
| Zeros (%) | 91.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 6.12 |
| Maximum | 31.25 |
| Range | 31.25 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.801222874 |
|---|---|
| Coefficient of variation (CV) | 3.484346266 |
| Kurtosis | 20.60496277 |
| Mean | 0.5169471505 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.9283023 |
| Sum | 38206.53 |
| Variance | 3.244403841 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 67701 | 91.6% |
| 6.12 | 5345 | 7.2% |
| 2.29 | 273 | 0.4% |
| 2.8 | 241 | 0.3% |
| 12.24 | 207 | 0.3% |
| 8.41 | 51 | 0.1% |
| 11.75 | 20 | < 0.1% |
| 4.58 | 12 | < 0.1% |
| 27.5 | 9 | < 0.1% |
| 17.87 | 8 | < 0.1% |
| Other values (16) | 41 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 67701 | 91.6% |
| 2 | 2 | < 0.1% |
| 2.29 | 273 | 0.4% |
| 2.8 | 241 | 0.3% |
| 4.58 | 12 | < 0.1% |
| 4.75 | 1 | < 0.1% |
| 5.6 | 1 | < 0.1% |
| 6.12 | 5345 | 7.2% |
| 8 | 3 | < 0.1% |
| 8.41 | 51 | 0.1% |
| Value | Count | Frequency (%) |
| 31.25 | 1 | < 0.1% |
| 27.5 | 9 | < 0.1% |
| 23.5 | 2 | < 0.1% |
| 19.87 | 7 | < 0.1% |
| 18.36 | 2 | < 0.1% |
| 17.87 | 8 | < 0.1% |
| 16.82 | 1 | < 0.1% |
| 16.33 | 1 | < 0.1% |
| 16.12 | 1 | < 0.1% |
| 15.04 | 1 | < 0.1% |
improvement_surcharge
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 577.5 KiB |
| 0.3 | 73723 |
|---|---|
| 0.0 | 119 |
| -0.3 | 66 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.000893002 |
| Min length | 3 |
Characters and Unicode
| Total characters | 221,790 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.3 |
|---|---|
| 2nd row | 0.3 |
| 3rd row | 0.3 |
| 4th row | 0.3 |
| 5th row | 0.3 |
Common Values
| Value | Count | Frequency (%) |
| 0.3 | 73723 | 99.7% |
| 0.0 | 119 | 0.2% |
| -0.3 | 66 | 0.1% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0.3 | 73789 | |
| 0.0 | 119 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 74027 | |
| . | 73908 | |
| 3 | 73789 | |
| - | 66 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 147816 | |
| Other Punctuation | 73908 | |
| Dash Punctuation | 66 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 74027 | |
| 3 | 73789 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 73908 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 66 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 221790 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 74027 | |
| . | 73908 | |
| 3 | 73789 | |
| - | 66 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 221790 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 74027 | |
| . | 73908 | |
| 3 | 73789 | |
| - | 66 | < 0.1% |
total_amount
Real number (ℝ)
| Distinct | 3746 |
|---|---|
| Distinct (%) | 5.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 23.14664597 |
| Minimum | -280.3 |
|---|---|
| Maximum | 280.3 |
| Zeros | 70 |
| Zeros (%) | 0.1% |
| Negative | 72 |
| Negative (%) | 0.1% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | -280.3 |
|---|---|
| 5-th percentile | 6.8 |
| Q1 | 12 |
| median | 20.64 |
| Q3 | 29 |
| 95-th percentile | 53.47 |
| Maximum | 280.3 |
| Range | 560.6 |
| Interquartile range (IQR) | 17 |
Descriptive statistics
| Standard deviation | 14.80600905 |
|---|---|
| Coefficient of variation (CV) | 0.6396611011 |
| Kurtosis | 6.822531653 |
| Mean | 23.14664597 |
| Median Absolute Deviation (MAD) | 8.58 |
| Skewness | 1.376921049 |
| Sum | 1710722.31 |
| Variance | 219.2179039 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7.8 | 1065 | 1.4% |
| 19.78 | 1061 | 1.4% |
| 7.3 | 1023 | 1.4% |
| 8.3 | 1009 | 1.4% |
| 6.8 | 997 | 1.3% |
| 8.8 | 944 | 1.3% |
| 6.3 | 852 | 1.2% |
| 18.5 | 842 | 1.1% |
| 9.3 | 795 | 1.1% |
| 9.8 | 790 | 1.1% |
| Other values (3736) | 64530 | 87.3% |
| Value | Count | Frequency (%) |
| -280.3 | 1 | < 0.1% |
| -120.3 | 1 | < 0.1% |
| -52.8 | 1 | < 0.1% |
| -42.52 | 1 | < 0.1% |
| -28.3 | 1 | < 0.1% |
| -26.36 | 1 | < 0.1% |
| -25.3 | 3 | < 0.1% |
| -15.89 | 1 | < 0.1% |
| -15.3 | 1 | < 0.1% |
| -14.27 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 280.3 | 1 | < 0.1% |
| 175 | 1 | < 0.1% |
| 166.3 | 1 | < 0.1% |
| 154.17 | 1 | < 0.1% |
| 151.3 | 1 | < 0.1% |
| 144.36 | 1 | < 0.1% |
| 128.17 | 1 | < 0.1% |
| 124.04 | 1 | < 0.1% |
| 123.25 | 1 | < 0.1% |
| 121.8 | 1 | < 0.1% |
payment_type
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 35733 |
| Missing (%) | 48.3% |
| Memory size | 577.5 KiB |
| 1.0 | 23041 |
|---|---|
| 2.0 | 14926 |
| 3.0 | 157 |
| 4.0 | 50 |
| 5.0 | 1 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 114,525 |
|---|---|
| Distinct characters | 7 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 2.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 2.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 23041 | 31.2% |
| 2.0 | 14926 | 20.2% |
| 3.0 | 157 | 0.2% |
| 4.0 | 50 | 0.1% |
| 5.0 | 1 | < 0.1% |
| (Missing) | 35733 | 48.3% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1.0 | 23041 | |
| 2.0 | 14926 | |
| 3.0 | 157 | 0.4% |
| 4.0 | 50 | 0.1% |
| 5.0 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 23041 | |
| 2 | 14926 | 13.0% |
| 3 | 157 | 0.1% |
| 4 | 50 | < 0.1% |
| 5 | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 76350 | |
| Other Punctuation | 38175 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 38175 | |
| 1 | 23041 | |
| 2 | 14926 | 19.5% |
| 3 | 157 | 0.2% |
| 4 | 50 | 0.1% |
| 5 | 1 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 38175 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 114525 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 23041 | |
| 2 | 14926 | 13.0% |
| 3 | 157 | 0.1% |
| 4 | 50 | < 0.1% |
| 5 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 114525 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 23041 | |
| 2 | 14926 | 13.0% |
| 3 | 157 | 0.1% |
| 4 | 50 | < 0.1% |
| 5 | 1 | < 0.1% |
trip_type
Categorical
HIGH CORRELATION (×5), MISSING
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 35733 |
| Missing (%) | 48.3% |
| Memory size | 577.5 KiB |
| 1.0 | 37535 |
|---|---|
| 2.0 | 640 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 114,525 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 37535 | 50.8% |
| 2.0 | 640 | 0.9% |
| (Missing) | 35733 | 48.3% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1.0 | 37535 | |
| 2.0 | 640 | 1.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 37535 | |
| 2 | 640 | 0.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 76350 | |
| Other Punctuation | 38175 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 38175 | |
| 1 | 37535 | |
| 2 | 640 | 0.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 38175 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 114525 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 37535 | |
| 2 | 640 | 0.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 114525 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 38175 | |
| 0 | 38175 | |
| 1 | 37535 | |
| 2 | 640 | 0.6% |
congestion_surcharge
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 35733 |
| Missing (%) | 48.3% |
| Memory size | 577.5 KiB |
| Value | Count |
|---|---|
| 0.0 | 29228 |
| 2.75 | 8942 |
| 2.5 | 5 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.234237066 |
| Min length | 3 |
Characters and Unicode
| Total characters | 123,467 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 2.75 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 29228 | 39.5% |
| 2.75 | 8942 | 12.1% |
| 2.5 | 5 | < 0.1% |
| (Missing) | 35733 | 48.3% |
Length
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0.0 | 29228 | |
| 2.75 | 8942 | 23.4% |
| 2.5 | 5 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 58456 | |
| . | 38175 | |
| 2 | 8947 | 7.2% |
| 5 | 8947 | 7.2% |
| 7 | 8942 | 7.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 85292 | |
| Other Punctuation | 38175 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 58456 | |
| 2 | 8947 | 10.5% |
| 5 | 8947 | 10.5% |
| 7 | 8942 | 10.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 38175 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 123467 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 58456 | |
| . | 38175 | |
| 2 | 8947 | 7.2% |
| 5 | 8947 | 7.2% |
| 7 | 8942 | 7.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 123467 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 58456 | |
| . | 38175 | |
| 2 | 8947 | 7.2% |
| 5 | 8947 | 7.2% |
| 7 | 8942 | 7.2% |
duration
| Distinct | 3120 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.85257843 |
| Minimum | 1 |
| Maximum | 60 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 577.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3.816666667 |
| Q1 | 8.05 |
| median | 14 |
| Q3 | 22.63333333 |
| 95-th percentile | 41 |
| Maximum | 60 |
| Range | 59 |
| Interquartile range (IQR) | 14.58333333 |
Descriptive statistics
| Standard deviation | 11.56316304 |
|---|---|
| Coefficient of variation (CV) | 0.6861361357 |
| Kurtosis | 1.078105442 |
| Mean | 16.85257843 |
| Median Absolute Deviation (MAD) | 6.716666667 |
| Skewness | 1.187936546 |
| Sum | 1245540.367 |
| Variance | 133.7067395 |
| Monotonicity | Not monotonic |
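Several entries in the quantile and descriptive tables above are derived from the others. The following is a minimal sketch that recomputes them from the rounded figures printed in this report, assuming n = 73,908 (the observation count from the dataset overview, with no missing values for this variable). The variable names are illustrative and not part of the report.

```python
# Values copied from the tables above (rounded as printed in the report).
n = 73908                    # number of observations (assumed: no missing values)
total = 1245540.367          # Sum
q1, q3 = 8.05, 22.63333333   # Q1 and Q3
minimum, maximum = 1, 60
std = 11.56316304            # Standard deviation

mean = total / n                  # arithmetic mean
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation

print(mean, iqr, value_range, cv, variance)
```

Because the inputs are rounded, the recomputed values agree with the reported ones only up to rounding error.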
Common Values
| Value | Count | Frequency (%) |
| 13 | 1427 | 1.9% |
| 10 | 1425 | 1.9% |
| 15 | 1424 | 1.9% |
| 11 | 1421 | 1.9% |
| 12 | 1405 | 1.9% |
| 14 | 1374 | 1.9% |
| 9 | 1327 | 1.8% |
| 16 | 1277 | 1.7% |
| 8 | 1233 | 1.7% |
| 17 | 1207 | 1.6% |
| Other values (3110) | 60388 | 81.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 75 | |
| 1.016666667 | 15 | < 0.1% |
| 1.033333333 | 10 | < 0.1% |
| 1.05 | 10 | < 0.1% |
| 1.066666667 | 9 | < 0.1% |
| 1.083333333 | 11 | < 0.1% |
| 1.1 | 13 | < 0.1% |
| 1.116666667 | 6 | < 0.1% |
| 1.133333333 | 14 | < 0.1% |
| 1.15 | 8 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 60 | 36 | |
| 59.98333333 | 2 | < 0.1% |
| 59.95 | 2 | < 0.1% |
| 59.93333333 | 1 | < 0.1% |
| 59.9 | 1 | < 0.1% |
| 59.86666667 | 1 | < 0.1% |
| 59.85 | 1 | < 0.1% |
| 59.83333333 | 1 | < 0.1% |
| 59.81666667 | 1 | < 0.1% |
| 59.78333333 | 1 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
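The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative sketch, not the code the report generator runs; the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, and giving tied values the mean of their ranks is an assumption.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship (made-up data).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this made-up data the ranks of x and y coincide, so ρ is 1 up to floating-point error, while Pearson's r on the raw values is below 1.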
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
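As a small illustration of the formula and of the invariance property, the sketch below computes r for the trip_distance and fare_amount values of the first five sample rows shown in this report, then again after shifting and rescaling both variables. The function name `pearson` and the particular shift and scale constants are arbitrary choices for the example.

```python
from math import sqrt

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# trip_distance and fare_amount from the first five sample rows of this report.
trip_distance = [1.01, 2.53, 1.12, 1.99, 0.45]
fare_amount = [5.5, 10.0, 6.0, 8.0, 3.5]

r = pearson(trip_distance, fare_amount)

# Separate changes of location and (positive) scale leave r unchanged.
shifted_distance = [2.0 * d + 3.0 for d in trip_distance]
shifted_fare = [0.5 * f - 1.0 for f in fare_amount]
r_shifted = pearson(shifted_distance, shifted_fare)

print(r, r_shifted)
```

Five rows are far too few to say anything about the full dataset; the point is only that both calls return the same value up to floating-point error.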
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
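The pair-counting definition above translates directly into code. The sketch below implements exactly that formula; it assumes pairs tied in either variable count as neither concordant nor discordant, which the description does not spell out, and library implementations often apply a tie correction instead. The function name `kendall_tau` and the data are made up for illustration.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]   # two adjacent swaps relative to x
tau = kendall_tau(x, y)
print(tau)  # 8 concordant and 2 discordant pairs out of 10 -> 0.6
```

The double loop is quadratic in the number of observations, which is fine for an illustration but slow for a column with tens of thousands of rows.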
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
First rows
| | df_index | VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2 | 2021-01-01 00:15:56 | 2021-01-01 00:19:52 | N | 1.0 | 43 | 151 | 1.0 | 1.01 | 5.5 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 6.80 | 2.0 | 1.0 | 0.00 | 3.933333 |
| 1 | 1 | 2 | 2021-01-01 00:25:59 | 2021-01-01 00:34:44 | N | 1.0 | 166 | 239 | 1.0 | 2.53 | 10.0 | 0.5 | 0.5 | 2.81 | 0.0 | None | 0.3 | 16.86 | 1.0 | 1.0 | 2.75 | 8.750000 |
| 2 | 2 | 2 | 2021-01-01 00:45:57 | 2021-01-01 00:51:55 | N | 1.0 | 41 | 42 | 1.0 | 1.12 | 6.0 | 0.5 | 0.5 | 1.00 | 0.0 | None | 0.3 | 8.30 | 1.0 | 1.0 | 0.00 | 5.966667 |
| 3 | 3 | 2 | 2020-12-31 23:57:51 | 2021-01-01 00:04:56 | N | 1.0 | 168 | 75 | 1.0 | 1.99 | 8.0 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 9.30 | 2.0 | 1.0 | 0.00 | 7.083333 |
| 4 | 7 | 2 | 2021-01-01 00:26:31 | 2021-01-01 00:28:50 | N | 1.0 | 75 | 75 | 6.0 | 0.45 | 3.5 | 0.5 | 0.5 | 0.96 | 0.0 | None | 0.3 | 5.76 | 1.0 | 1.0 | 0.00 | 2.316667 |
| 5 | 9 | 2 | 2021-01-01 00:58:32 | 2021-01-01 01:32:34 | N | 1.0 | 225 | 265 | 1.0 | 12.19 | 38.0 | 0.5 | 0.5 | 2.75 | 0.0 | None | 0.3 | 42.05 | 1.0 | 1.0 | 0.00 | 34.033333 |
| 6 | 10 | 2 | 2021-01-01 00:31:14 | 2021-01-01 00:55:07 | N | 1.0 | 244 | 244 | 2.0 | 3.39 | 18.0 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 19.30 | 2.0 | 1.0 | 0.00 | 23.883333 |
| 7 | 11 | 2 | 2021-01-01 00:08:50 | 2021-01-01 00:21:56 | N | 1.0 | 75 | 213 | 1.0 | 6.69 | 19.5 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 20.80 | 2.0 | 1.0 | 0.00 | 13.100000 |
| 8 | 12 | 2 | 2021-01-01 00:35:13 | 2021-01-01 00:44:44 | N | 1.0 | 74 | 238 | 1.0 | 2.34 | 10.0 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 14.05 | 1.0 | 1.0 | 2.75 | 9.516667 |
| 9 | 13 | 2 | 2021-01-01 00:39:57 | 2021-01-01 00:55:25 | N | 1.0 | 74 | 60 | 1.0 | 5.48 | 18.0 | 0.5 | 0.5 | 0.00 | 0.0 | None | 0.3 | 19.30 | 2.0 | 1.0 | 0.00 | 15.466667 |
Last rows
| | df_index | VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge | duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 73898 | 76508 | 2 | 2021-01-31 20:17:00 | 2021-01-31 20:35:00 | None | NaN | 108 | 210 | NaN | 5.05 | 25.95 | 2.75 | 0.0 | 0.0 | 0.00 | None | 0.3 | 29.00 | NaN | NaN | NaN | 18.0 |
| 73899 | 76509 | 2 | 2021-01-31 20:23:00 | 2021-01-31 20:41:00 | None | NaN | 60 | 254 | NaN | 5.33 | 29.45 | 2.75 | 0.0 | 0.0 | 0.00 | None | 0.3 | 32.50 | NaN | NaN | NaN | 18.0 |
| 73900 | 76510 | 2 | 2021-01-31 21:09:00 | 2021-01-31 21:23:00 | None | NaN | 174 | 213 | NaN | 5.18 | 29.45 | 2.75 | 0.0 | 0.0 | 0.00 | None | 0.3 | 32.50 | NaN | NaN | NaN | 14.0 |
| 73901 | 76511 | 2 | 2021-01-31 21:33:00 | 2021-01-31 22:18:00 | None | NaN | 136 | 225 | NaN | 17.13 | 71.83 | 2.75 | 0.0 | 0.0 | 6.12 | None | 0.3 | 81.00 | NaN | NaN | NaN | 45.0 |
| 73902 | 76512 | 2 | 2021-01-31 21:58:00 | 2021-01-31 22:47:00 | None | NaN | 218 | 41 | NaN | 18.18 | 56.86 | 2.75 | 0.0 | 0.0 | 6.12 | None | 0.3 | 66.03 | NaN | NaN | NaN | 49.0 |
| 73903 | 76513 | 2 | 2021-01-31 21:38:00 | 2021-01-31 22:16:00 | None | NaN | 81 | 90 | NaN | 17.63 | 56.23 | 2.75 | 0.0 | 0.0 | 6.12 | None | 0.3 | 65.40 | NaN | NaN | NaN | 38.0 |
| 73904 | 76514 | 2 | 2021-01-31 22:43:00 | 2021-01-31 23:21:00 | None | NaN | 35 | 213 | NaN | 18.36 | 46.66 | 0.00 | 0.0 | 12.2 | 6.12 | None | 0.3 | 65.28 | NaN | NaN | NaN | 38.0 |
| 73905 | 76515 | 2 | 2021-01-31 22:16:00 | 2021-01-31 22:27:00 | None | NaN | 74 | 69 | NaN | 2.50 | 18.95 | 2.75 | 0.0 | 0.0 | 0.00 | None | 0.3 | 22.00 | NaN | NaN | NaN | 11.0 |
| 73906 | 76516 | 2 | 2021-01-31 23:10:00 | 2021-01-31 23:37:00 | None | NaN | 168 | 215 | NaN | 14.48 | 48.87 | 2.75 | 0.0 | 0.0 | 6.12 | None | 0.3 | 58.04 | NaN | NaN | NaN | 27.0 |
| 73907 | 76517 | 2 | 2021-01-31 23:25:00 | 2021-01-31 23:35:00 | None | NaN | 119 | 244 | NaN | 1.81 | 15.45 | 2.75 | 0.0 | 0.0 | 0.00 | None | 0.3 | 18.50 | NaN | NaN | NaN | 10.0 |
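The duration values in the sample rows are consistent with the drop-off time minus the pickup time, expressed in minutes. The report does not state how the column was derived, so the sketch below is an assumption checked only against the rows shown; the helper name `duration_minutes` is hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def duration_minutes(pickup, dropoff):
    """Elapsed time between two timestamp strings, in minutes."""
    start = datetime.strptime(pickup, TIMESTAMP_FORMAT)
    end = datetime.strptime(dropoff, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 60

# First sample row: the report lists a duration of 3.933333.
first = duration_minutes("2021-01-01 00:15:56", "2021-01-01 00:19:52")

# Fourth sample row crosses midnight: the report lists 7.083333.
across_midnight = duration_minutes("2020-12-31 23:57:51", "2021-01-01 00:04:56")

# Last sample row: the report lists 10.0.
last = duration_minutes("2021-01-31 23:25:00", "2021-01-31 23:35:00")

print(first, across_midnight, last)
```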